Chapter 11. Writing Common Gateway Interface Programs

The Web is an interactive medium that allows users to give feedback to a company about its products, use search utilities to locate information on a topic, use conversion programs to convert one value to another, and more. The Internet Connection Server software does not perform these tasks itself. They are performed by external programs that use information passed to them by the server. The Common Gateway Interface (CGI) allows the server and the external program to communicate. CGI programs are stored in the CGI-BIN subdirectory.

This chapter discusses what the Common Gateway Interface is, why you might want to use it, and how it works.


Overview of CGI

CGI is a standard, supported by almost all Web servers, that defines how information is exchanged between a Web server and an external program (CGI program).

CGI programs can be written in any language supported by the operating system on which the server is run. The language can be a programming language, like C++, or it can be a scripting language, like Perl or REXX. Programs written in programming languages need to be compiled, and typically run faster than uncompiled programs. On the other hand, those written in scripting languages tend to be easier to write, maintain, and debug.

The functions and tasks that CGI programs can perform range from the simple to the very advanced. In general, programs that perform simple tasks are called CGI scripts (because they are not compiled). Those that perform more complex tasks are often called gateway programs. In this chapter, we refer to both types as CGI programs.

Given the wide choice of languages and the variety of functions, the possibilities for CGI programs seem almost endless. How you use them is up to you. Once you understand the CGI specification, you will know how servers pass input to CGI programs and how servers expect output.

There are many uses for CGI programs. Basically, they are designed to handle dynamic information. Dynamic in this context refers to temporary information that is created for a one-time use and not stored anywhere on the Web. This information may be a document, an e-mail message, or the results of a conversion program.

CGI and Dynamic Documents

Many types of files exist on the Web.

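Servers break HTML documents into two distinct types:

  * Static documents, which exist as files on the server and are sent as-is
  * Dynamic documents, which are generated by a program in response to a specific request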

Consider the process of "serving" these two types of documents. Responding to requests for static documents is fairly simple. For example, Jill User accesses the Acme Web server to get information on the Pro-Expert gas grill. She clicks on Products, then on Grills, and finally on Pro-Expert. Each time Jill clicks on a link, the Web browser uses the URL attached to the link to request a specific document from the Web server and the server responds by sending a copy of the document to Jill's browser.

What if Jill then decides she wants to search through the information on the Acme Web server for all documents that contain information on Acme grills, such as news articles, press releases, price listings, and service agreements? This is a more difficult request to process, because it is not a request for an existing document. Instead, it is a request for a dynamically generated list of documents that meet certain criteria. This is where CGI comes in.

You can use a CGI program to parse the request, search through the documents on your Web server, and create a list with hypertext links to each of the documents that contain the specified word or string.
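As a rough illustration of the last step, the sketch below builds the list of links. It assumes the documents are already loaded into memory in a hypothetical Document array (a real program would read the files in your document directory) and uses a simple case-sensitive substring match. BuildSearchResults and Append are illustrative names, not part of the server.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the documents on your server */
typedef struct {
    const char *url;   /* URL used in the hypertext link */
    const char *text;  /* contents of the document       */
} Document;

/* Appends s to the string in out without exceeding outlen bytes */
static void Append(char *out, size_t outlen, const char *s)
{
    size_t used = strlen(out);

    if (used + 1 < outlen)
        strncat(out, s, outlen - used - 1);
}

/* Writes an HTML list of links to every document whose text contains
   term, and returns the number of matching documents. The list is
   silently truncated if it does not fit in out. */
int BuildSearchResults(const char *term, const Document *docs, int ndocs,
                       char *out, size_t outlen)
{
    char line[512];
    int matches = 0;
    int i;

    if (outlen == 0)
        return 0;
    out[0] = '\0';
    Append(out, outlen, "<ul>\n");
    for (i = 0; i < ndocs; i++) {
        if (strstr(docs[i].text, term) != NULL) {
            snprintf(line, sizeof line, "<li><a href=\"%s\">%s</a></li>\n",
                     docs[i].url, docs[i].url);
            Append(out, outlen, line);
            matches++;
        }
    }
    Append(out, outlen, "</ul>\n");
    return matches;
}
```

The program would print the Content-type header and a blank line, then the buffer. Because strstr is case-sensitive, a real search would probably normalize case, and for many documents it would use an index rather than scanning every file on each request.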

Uses for CGI

HTML allows you to access resources on the Internet using other protocols by specifying the protocol in the URL. One such protocol is mailto. If you code a link with mailto followed by an e-mail address, the link will result in a generic mail form.

What if you wanted your customers to provide specific information, such as how often they use the Web or how they heard about your company? Rather than using the generic mailto form, you can create a form that asks these questions and more. You can then use a CGI program to interpret the information, include it in an e-mail message, and send it to the appropriate person.

CGI programs are not limited to processing search requests and e-mail. They can be used for a wide variety of purposes. Basically, anytime you want to take input from the reader and generate a response, you can use a CGI program. The input may not even be apparent to the reader. For example, many people are interested in how many other people have visited their home page. As a fun way to keep count of this, you can create a CGI program that keeps track of the number of requests for your home page and displays the new total each time someone requests your home page.
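A minimal sketch of the counting part, assuming the total is kept as a number in a small text file whose path you choose. IncrementHitCount is a hypothetical name. The sketch does no file locking, so two requests arriving at the same moment could both read the same old value.

```c
#include <stdio.h>

/* Reads the current count from the file at path (0 if the file does not
   exist yet), adds one, writes the new total back, and returns it.
   Returns -1 if the file cannot be written. */
long IncrementHitCount(const char *path)
{
    long count = 0;
    FILE *fp = fopen(path, "r");

    if (fp != NULL) {
        if (fscanf(fp, "%ld", &count) != 1)
            count = 0;              /* empty or damaged file: start over */
        fclose(fp);
    }

    count++;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", count);
    fclose(fp);
    return count;
}
```

The program's main function would print a Content-type header for text/html and a blank line, followed by a line of HTML containing the value returned.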


The CGI Process

CGI programs are referenced from within HTML documents. In general, the HTML document specifies how information is passed to the CGI program; for a form, the METHOD attribute selects GET or POST and the ACTION attribute names the program. When you design the layout of your document, keep in mind how a CGI program that helps the user search for data, set preferences, or add information affects the look of your document. Developing the CGI program along with the HTML document will help you avoid many design mistakes.

Overview

The CGI process involves three players: the Web browser, the Web server, and the CGI program. The CGI Sample Test Case shows how CGI programs for online forms work. Let's assume that the Web browser has already requested and obtained the test case, shown below.


Figure 1. Sample Form, Page 1



Figure 2. Sample Form, Page 2


  1. The user clicks buttons or enters information in fields, and then clicks on the Apply button.

  2. The Web browser then sends the data to the Web server in an encoded format. In our example, the data consists of responses on an HTML form.

  3. Upon receiving the data, the Web server converts the data to a format compliant with the CGI specification for input and sends it to the CGI program.

  4. The CGI program then decodes the data and processes it according to its instructions.

  5. The CGI program sends its response back to the Web server in a form that is compliant with the CGI specification for output.

  6. The Web server then interprets the response and forwards it to the Web browser.

The HTML source of this form illustrates the various types of fields:


<HTML>
<HEAD>
<TITLE>CGIXMP Test Case</TITLE>
</HEAD>
<BODY>
<H1>CGI Sample Test Case</H1>
Fill in the following fields and press APPLY.
The values you enter will
be read by the CGIXMP.EXE program and displayed in a simple HTML
form which is generated dynamically by the program.
<P> <HR>
<form method=POST action="/cgi-bin/cgixmp">
<P>
<H3>Checkbox Field</H3>
<P>
<PRE>
<input type="checkbox" name="var1" value="123">
Check to set variable VAR1 to 123<BR>
<input type="checkbox" name="var2" value="XyZ" checked>
Check to set variable VAR2 to XyZ<BR>
</PRE>
<P>
<H3>Radio Button Field</H3>
<P>
<PRE>
<input type="radio" name="var3" value="1">
Select to set variable VAR3 to 1<BR>
<input type="radio" name="var3" value="2">
Select to set variable VAR3 to 2<BR>
<input type="radio" name="var3" value="3" checked>
Select to set variable VAR3 to 3<BR>
<input type="radio" name="var3" value="4">
Select to set variable VAR3 to 4<BR>
</PRE>
<P>

<H3>Single selection List Field</H3>
<P>
<PRE>
Select a value for variable VAR4   <select size=1 name="var4">
<option>0<option>1<option>2<option>3
<option>4<option>5</select>
</PRE>
<P>
<H3>Text Entry Field</H3>
<P>
<PRE>
Enter value for variable VAR5 <input type="text" name="var5"
size=20 maxlength=256 value="TEST value">
</PRE>
<P>
<H3>Multiple selection List Field</H3>
<P>
<PRE>
Select a value for variable VAR6
<select multiple size=2 name="var6">
<option>Ford<option>Chevrolet<option>Chrysler<option>
Ferrari<option>Porsche
</select>
</PRE>
<P>
<H3>Password Field</H3>
<P>
<PRE>
Enter Password
<input type="password" name="pword" size=10 maxlength=10>
</PRE>
<P>
<H3>Hidden Field</H3>
<P>
<input type="hidden" name="hidden" value="Text not shown on form...">
<P>
<PRE>
<input type="submit" name="pushbutton" value="Apply">
<input type="reset" name="pushbutton" value="Reset">
<HR>
</PRE>
</form>
</BODY>
</HTML>

Sending Information to the Server

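When you fill out a form or enter a phrase in a search field and click the submission button, the Web browser sends the request to the server in a format described as URL-encoded. In URL-encoded information:

  * Each field is sent as a name=value pair.
  * Pairs are separated by ampersands (&).
  * Spaces are replaced by plus signs (+).
  * Special characters are replaced by a percent sign (%) followed by the two-digit hexadecimal value of the character.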
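To make the encoding concrete, here is a sketch of what the browser does to each name and value. It is an illustration only: UrlEncode is a hypothetical helper, and it escapes every character that is not a letter or digit (some browsers leave a few marks such as '-', '_', and '.' unescaped).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* URL-encodes src into dst (dstlen bytes): letters and digits are copied,
   spaces become '+', and every other character becomes %xx.
   Returns 0 on success, or -1 if dst is too small. */
int UrlEncode(const char *src, char *dst, size_t dstlen)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t used = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char) *src;

        if (isalnum(c)) {
            if (used + 1 >= dstlen)
                return -1;
            dst[used++] = (char) c;
        } else if (c == ' ') {
            if (used + 1 >= dstlen)
                return -1;
            dst[used++] = '+';
        } else {
            if (used + 3 >= dstlen)
                return -1;
            dst[used++] = '%';
            dst[used++] = hex[c >> 4];      /* high hex digit */
            dst[used++] = hex[c & 0x0F];    /* low hex digit  */
        }
    }
    if (dstlen == 0)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

Encoding Eugene T. Fox this way yields Eugene+T%2E+Fox, the form used in the QUERY_STRING example later in this chapter. A CGI program reverses these steps when it parses its input.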

Returning Output

When the CGI program is finished, it passes the resulting response to the Web server using standard output (stdout). The Web server interprets the response and sends it to the Web browser.

If the program encounters errors, it writes error information to standard error (stderr). The Internet Connection Server writes the error information to the cgi_error log. See "Logs" for more information on the cgi_error log.

The response contains MIME (Multipurpose Internet Mail Extensions) headers that tell the browser how to display the returned information. The response must contain at least one header (normally CONTENT-TYPE, or Location for a redirection, as described below) followed by a blank line. The blank line separates the headers from the content of the response. For example:

HTTP/1.0 200 OK
MIME-Version:1.0
Date: Monday, 25 Oct 95 13:14:15 GMT
CONTENT-TYPE:text/html
DOCUMENT_NAME:c:/tcpip/cgi-bin/form.cmd
Expires: Wednesday, 25 Oct 95 13:14:15 GMT

If the response is a static document, the program returns the URL of the document using the HTTP Location header, followed by a blank line. For example:

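Location: http://www.acme.com/products.html

Upon receiving this information from the CGI program, the Web server arranges for the browser to receive the specified document. If the Location value is a path on the same server, such as /products.html, the server typically retrieves the document and sends a copy of it to the Web browser; if it is a full URL, as in this example, the server typically tells the browser to request that URL itself (a redirection).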
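A sketch of producing this response in C. FormatRedirect is a hypothetical helper that writes into a buffer so the result can be checked before it is printed. Rejecting carriage returns and line feeds in the URL is a precaution added here, since a URL built from user input could otherwise add extra header lines.

```c
#include <stdio.h>
#include <string.h>

/* Formats a redirection response: one Location header followed by the
   blank line that ends the headers. Returns the length written, or -1
   if url is unusable or out (outlen bytes) is too small. */
int FormatRedirect(const char *url, char *out, size_t outlen)
{
    int n;

    /* A CR or LF in the URL would start a new header line */
    if (url == NULL || strpbrk(url, "\r\n") != NULL)
        return -1;

    n = snprintf(out, outlen, "Location: %s\n\n", url);
    return (n >= 0 && (size_t) n < outlen) ? n : -1;
}
```

The program would then write the buffer to standard output with fputs(out, stdout).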

If the response is a dynamic document, such as a list of hypertext links to documents that meet specified criteria, the program should indicate that the response is an HTML document, using the Content-type header followed by a blank line, and then include links to the documents in HTML format.

Note: If you do not want the Web server to interpret the response, but just to forward it to the Web browser, the name of your CGI program must begin with nph- (no-parse header). A no-parse header program has output that is a complete HTTP response requiring no further action (interpretation or modification) on the part of the server.
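For comparison, an nph- program must write the status line and all headers itself. The sketch below assumes HTTP/1.0 and a text/html body. FormatNphResponse is a hypothetical name, and the CR LF line endings follow the HTTP specification rather than anything specific to this server.

```c
#include <stdio.h>
#include <string.h>

/* Builds a complete HTTP response (status line, headers, blank line,
   body) of the kind an nph- program sends directly to the browser.
   Returns the length written, or -1 if out (outlen bytes) is too small. */
int FormatNphResponse(const char *html, char *out, size_t outlen)
{
    int n = snprintf(out, outlen,
                     "HTTP/1.0 200 OK\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n"
                     "%s",
                     (unsigned long) strlen(html), html);

    return (n >= 0 && (size_t) n < outlen) ? n : -1;
}
```

Because the server passes this output through unchanged, the program is responsible for getting every header right, including the length of the body.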

If the response is an HTML file, the program should indicate this using the Content-type header followed by a blank line, and then send the body of the document. For example, the HTML output from the CGI Sample Test Case looks like this:

CONTENT-TYPE: text/html
 
<html>
<head>
<title>Test HTML Page</title>
</head>
<body>
<h1>Variable Information</h1>
<hr>
<p>
<pre>Variable "var1" = 123</pre>
<p><pre>Variable "var2" = XyZ</pre>
<p><pre>Variable "var3" = 3</pre>
<p><pre>Variable "var4" = 0</pre>
<p><pre>Variable "var5" = TEST value</pre>
<p><pre>Variable "var6" = Ford</pre>
<p><pre>Variable "pword" = </pre>
<p><pre>Variable "hidden" = Text not shown on form...</pre>
<p><pre>Variable "pushbutton" = Apply</pre>
<p><pre><p>
<hr>
</body>
</html>

How CGI Programs Work

Most CGI programs include three stages: parsing, data manipulation, and response generation.

Parsing

Parsing is the first stage of a CGI program. In this stage, the program takes the data in one or more of the possible formats (environment variables, command-line arguments, or standard input), breaks it into components, and decodes the information in the components.

For example, the following could be received using the environment variable QUERY_STRING:

NAME=Eugene+T%2E+Fox&ADDR=etfox%40ibm.net&INTEREST=R\0C\0O

Parsing breaks the fields at the ampersands, converts plus signs to spaces, and decodes the hexadecimal escape sequences (%xx) into the characters they represent. The results look like this:

NAME=Eugene T. Fox
ADDR=etfox@ibm.net
INTEREST=RCO

You can use the cgiparse command to automatically parse query strings, read and write CONTENT_LENGTH characters, and count the number of form fields submitted. For a complete description of the cgiparse command, see Chapter 9. "Using Commands".
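If you parse the data yourself instead, the sketch below shows one way to split a GET query string into decoded name/value pairs. ParseQuery and UrlDecode are hypothetical helpers. Unlike the HexVal-based example later in this chapter, UrlDecode leaves a % that is not followed by two hexadecimal digits untouched.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit() */
static int HexDigit(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes one URL-encoded component in place: '+' becomes a space and
   %xx becomes the character with hex code xx. */
void UrlDecode(char *s)
{
    char *out = s;

    for (; *s != '\0'; s++) {
        if (*s == '+') {
            *out++ = ' ';
        } else if (*s == '%' && isxdigit((unsigned char) s[1])
                   && isxdigit((unsigned char) s[2])) {
            *out++ = (char) (HexDigit((unsigned char) s[1]) * 16
                             + HexDigit((unsigned char) s[2]));
            s += 2;                     /* skip the two hex digits */
        } else {
            *out++ = *s;
        }
    }
    *out = '\0';
}

/* Splits query in place at '&' and '=' and decodes each name and value.
   Stores up to max pairs in names[] and values[]; returns the count. */
int ParseQuery(char *query, char *names[], char *values[], int max)
{
    int count = 0;
    char *field = query;

    while (field != NULL && *field != '\0' && count < max) {
        char *amp = strchr(field, '&');
        char *eq;

        if (amp != NULL)
            *amp = '\0';                /* terminate this field */
        eq = strchr(field, '=');
        if (eq != NULL) {
            *eq = '\0';
            values[count] = eq + 1;
        } else {
            values[count] = field + strlen(field);   /* empty value */
        }
        names[count] = field;
        UrlDecode(names[count]);
        UrlDecode(values[count]);
        count++;
        field = (amp != NULL) ? amp + 1 : NULL;
    }
    return count;
}
```

In a CGI program, copy the value of getenv("QUERY_STRING") into a writable buffer first, because ParseQuery modifies the string in place and the environment string should not be changed.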

Data Manipulation

Data manipulation is the second stage of a CGI program. In this stage, the program takes the parsed data and performs the appropriate action. For example, a CGI program designed to process an application form might:

  1. Take the input from the parsing stage

  2. Convert abbreviations into more meaningful information

  3. Plug the information into an e-mail template

  4. Call the sendmail program

  5. Send the filled-in template to a specified e-mail address
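Steps 2 and 3 might look like the sketch below. The abbreviation table and the template text are hypothetical, made up for illustration. Steps 4 and 5 are omitted because how you hand a message to sendmail depends on your system.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical abbreviations a form might send for an INTEREST field */
static const struct {
    const char *code;
    const char *meaning;
} Interests[] = {
    { "GR", "Gas grills" },
    { "SV", "Service agreements" }
};

/* Step 2: converts an abbreviation into more meaningful text.
   Unknown codes are passed through unchanged. */
static const char *ExpandCode(const char *code)
{
    size_t i;

    for (i = 0; i < sizeof Interests / sizeof Interests[0]; i++)
        if (strcmp(code, Interests[i].code) == 0)
            return Interests[i].meaning;
    return code;
}

/* Step 3: plugs the information into a simple e-mail template.
   Returns 0 on success, or -1 if the message does not fit in out. */
int FillMailTemplate(const char *to, const char *name, const char *interest,
                     char *out, size_t outlen)
{
    int n = snprintf(out, outlen,
                     "To: %s\n"
                     "Subject: Application form from %s\n"
                     "\n"
                     "Applicant: %s\n"
                     "Interested in: %s\n",
                     to, name, name, ExpandCode(interest));

    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}
```

Because the applicant's name also appears in the Subject header, a real program should reject values containing line breaks before building the message.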

Response Generation

Response generation is the final stage of a CGI program. In this stage, the program formulates its response to the Web server, which forwards it to the Web browser. The response contains MIME headers that vary depending on the type of response. With a search, the response might be the URLs of all the documents that met the search criteria. With a request that results in e-mail, the response might be a message confirming that the e-mail was sent.
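For the e-mail case, a sketch of a confirmation page. FormatConfirmation and HtmlEscape are hypothetical helpers. Escaping &, <, > and " before echoing form input is a precaution added here so that text a user typed is displayed rather than interpreted as HTML.

```c
#include <stdio.h>
#include <string.h>

/* Copies src to dst, replacing &, <, > and " with entity references.
   Returns 0, or -1 if dst (dstlen bytes) is too small. */
int HtmlEscape(const char *src, char *dst, size_t dstlen)
{
    size_t used = 0;

    for (; *src != '\0'; src++) {
        const char *rep;
        char one[2];
        size_t len;

        switch (*src) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  one[0] = *src; one[1] = '\0'; rep = one; break;
        }
        len = strlen(rep);
        if (used + len >= dstlen)
            return -1;
        memcpy(dst + used, rep, len);
        used += len;
    }
    if (used >= dstlen)
        return -1;
    dst[used] = '\0';
    return 0;
}

/* Formats a confirmation page, including the Content-type header and
   the blank line that ends the headers. Returns 0 or -1. */
int FormatConfirmation(const char *name, char *out, size_t outlen)
{
    char safe[256];
    int n;

    if (HtmlEscape(name, safe, sizeof safe) != 0)
        return -1;
    n = snprintf(out, outlen,
                 "Content-type: text/html\n\n"
                 "<html>\n<head><title>Thank you</title></head>\n"
                 "<body>\n<h1>Thank you, %s</h1>\n"
                 "<p>Your comments have been sent.\n</body>\n</html>\n",
                 safe);
    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}
```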

You can use the cgiutils command to produce full or partial sets of MIME headers. By default, the cgiutils command produces the server header and date field; you can use flags to suppress the date, display version information, set an expiration, and more. See Chapter 9. "Using Commands" for a complete listing of cgiutils flags and examples.

The following C code is the CGIXMP program that processes the form in the CGI Sample Test Case. It shows how to write a CGI program.


Figure 4. Example of C Code

/* Includes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <os2.h>
/* Definition of structure used in linked list of variables */
typedef struct _argument
  {
  char *VariableName;
  char *Value;
  struct _argument *pNext;
  } PARMLIST, *PPARMLIST;
 
/* Function definitions */
int ErrMsg (char *msg);
PPARMLIST ReadArguments(int InputLength);
static void PlusesToSpaces(char *Str);
static int HexVal(char c);
static void TranslateEscapes(char *Str);
/******************************************************************************/
/*                                                                            */
/* Function       : main                                                      */
/*                                                                            */
/* Description    : This is a CGI program which takes the output of a form    */
/*                  submitted with a method of POST, and displays a list      */
/*                  of the variable names and values.                         */
/*                                                                            */
/******************************************************************************/
int main(int argc, char *argv[])
  {
  char *requestMethod;
  char *contentLength;
  int argLength;
  PPARMLIST pParm = NULL;
  PPARMLIST pHead = NULL;
 
  /* This CGI program must be called with a method of POST */
  requestMethod = getenv("REQUEST_METHOD");
 
  if ((requestMethod == NULL) ||
      (stricmp(requestMethod, "POST")))
    {
    ErrMsg("REQUEST_METHOD environment variable not properly set to POST\n");
    }
  else
    {
    /* Get the length of the arguments passed in to this program */
    contentLength = getenv("CONTENT_LENGTH");
 
    if (contentLength == NULL)
      {
      ErrMsg("CONTENT_LENGTH environment variable not set\n");
      }
    else
      {
      /* Begin output of HTML to display results of CGI program */
      printf("Content-type: text/html\n\n");
      printf("<html>\n");
      printf("<head>\n");
      printf("<title>Sample HTML Page</title>\n");
      printf("</head>\n");
      printf("<body>\n");
      printf("<h1>Variable Information</h1>\n");
      printf("<hr>\n");
      printf("<p>\n");
      /* Read the arguments passed in to this program and place them in */
      /* a singly linked list - one link per variable                   */
      argLength = atoi(contentLength);
      pHead = ReadArguments(argLength);
      pParm = pHead;
 
      /* Output the list of variable names and values */
      while (pParm)
        {
        printf("<pre>Variable \"%s\" = %s</pre><p>",
               pParm->VariableName,
               (pParm->Value != NULL) ? pParm->Value : "");
 
        pParm = pParm->pNext;
        }
 
      /* Output the remainder of the HTML used to display the results */
      printf("<p>\n");
      printf("<hr>\n");
      printf("</body>\n");
      printf("</html>\n");
      }
    }
  }
 
 
/******************************************************************************/
/*                                                                            */
/* Function       : ReadArguments                                             */
/*                                                                            */
/* Description    : Read the arguments from stdin that are supplied           */
/*                  to a CGI program when the method is POST.                 */
/*                  Breaks up the input into                                  */
/*                  (Variable, Value) pairs.                                  */
/*                  Handles translating of all the special characters         */
/*                  that HTTP puts into the strings.                          */
/*                                                                            */
/******************************************************************************/
PPARMLIST ReadArguments(int InputLength)
  {
  PPARMLIST pCur= NULL;
  PPARMLIST pHead= NULL;
  PPARMLIST pPrev= NULL;
  char *Input;
  char *pToken;
 
  if (InputLength < 1)
    {
    return(NULL);
    }
  /* Allocate a buffer for the input */
  Input = malloc(InputLength + 1);
 
  if (Input == NULL)
    {
    return(NULL);
    }
 
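  /* Read the input. stdin is not guaranteed to end with an end-of-file */
  /* marker, so read exactly the number of bytes given by CONTENT_LENGTH */
  InputLength = (int) fread(Input, 1, InputLength, stdin);
  Input[InputLength] = '\0';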
 
  /* Variables are separated by the "&" character */
  pToken = strtok(Input, "&");
 
  while (pToken)
    {
    /* Create and fill in linked list of variable information */
    pCur = malloc(sizeof(PARMLIST));
    if (pCur == NULL)
      {
      break;    /* Out of memory: keep the variables read so far */
      }
    pCur->VariableName = pToken;
    pToken = strchr(pToken, '=');
    if (pToken)
      {
      *pToken = '\0';
      pCur->Value = ++pToken;
      PlusesToSpaces( pToken );
      TranslateEscapes( pToken );
      }
    else
      {
      pCur->Value = NULL;
      }
    if (pPrev)
      {
      pPrev->pNext = pCur;
      }
 
    if (!pHead)
      {
      pHead = pCur;
      }
    pPrev = pCur;
    pToken = strtok(NULL, "&");
    }
 
  if (pHead)
    {
    pPrev->pNext = NULL;
    }
  return(pHead);
  }
 
/******************************************************************************/
/*                                                                            */
/* Function       : PlusesToSpaces (STATIC)                                   */
/*                                                                            */
/* Description    : This one's easy. It just translates any '+'               */
/*                  characters found into ' ' characters.                     */
/*                                                                            */
/******************************************************************************/
static void PlusesToSpaces(char *Str)
  {
  if (Str != NULL)
    {
    while (*Str != '\0')
      {
      if (*Str == '+')
        {
        *Str = ' ';
        }
 
      ++Str;
      }
    }
  }
 
/******************************************************************************/
/*                                                                            */
/* Function       : HexVal (STATIC)                                           */
/*                                                                            */
/* Description    : This function returns a number that corresponds           */
/*                  to the value of a character treated as                    */
/*                  a hex digit. Case insensitive. Characters outside         */
/*                  0-9,a-f,A-F have a value of 0.                            */
/*                                                                            */
/******************************************************************************/
static int HexVal(char c)
  {
  int rc;
 
  switch (c)
    {
    case '1':
      rc = 1;
      break;
    case '2':
      rc = 2;
      break;
 
    case '3':
      rc = 3;
      break;
 
    case '4':
      rc = 4;
      break;
    case '5':
      rc = 5;
      break;
 
    case '6':
      rc = 6;
      break;
    case '7':
      rc = 7;
      break;
    case '8':
      rc = 8;
      break;
 
    case '9':
      rc = 9;
      break;
 
    case 'A':
    case 'a':
      rc = 10;
      break;
    case 'B':
    case 'b':
      rc = 11;
      break;
 
    case 'C':
    case 'c':
      rc = 12;
      break;
    case 'D':
    case 'd':
      rc = 13;
      break;
 
    case 'E':
    case 'e':
      rc = 14;
      break;
    case 'F':
    case 'f':
      rc = 15;
      break;
    default:
      rc = 0;
      break;
    }
 
  return(rc);
  }
 
/******************************************************************************/
/*                                                                            */
/* Function       : TranslateEscapes (STATIC)                                 */
/*                                                                            */
/* Description    : Translate the escape sequences induced by HTTP. The       */
/*                  sequences consist of %xx, where xx is a hex number.       */
/*                  We replace the % character with the actual character      */
/*                  (i.e., the one whose ASCII value is xx), and then         */
/*                  shift over the rest of the string to remove the xx.       */
/*                  This is done in-place.                                    */
/*                                                                            */
/******************************************************************************/
static void TranslateEscapes(char *Str)
  {
  char *NextEscape;
  char RealValue;
  int AsciiValue;
 
  NextEscape = strchr(Str, '%');
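  while (NextEscape != NULL)
    {
    /* Stop if the % is not followed by two more characters; */
    /* otherwise we would read past the end of the string    */
    if ((NextEscape[1] == '\0') || (NextEscape[2] == '\0'))
      {
      break;
      }
    AsciiValue = (16 * HexVal(NextEscape[1])) + HexVal(NextEscape[2]);
    *NextEscape = (char) AsciiValue;
    memmove(&NextEscape[1], &NextEscape[3], strlen(&NextEscape[3]) + 1);
    NextEscape = strchr(&NextEscape[1], '%');
    }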
  }
/******************************************************************************/
/* This function will output a message if an error occurs when attempting     */
/* to display a HTML page.                                                    */
/******************************************************************************/
int ErrMsg (char *msg)
  {
  printf("Content-type: text/html\n\n");
  printf("<html>\n");
  printf("<head>\n");
  printf("<title>Error</title>\n");
  printf("</head>\n");
  printf("<body>\n");
  printf("<h1>Error</h1>\n");
  printf("<hr>\n");
  printf("<p>\n");
  printf("An error occurred in the CGI program.\n");
  printf("The specific error message is shown below:\n");
  printf("<p>\n");
  printf("<pre>%s</pre>\n<p>", msg);
  printf("<p>\n");
  printf("<hr>\n");
  printf("</body>\n");
  printf("</html>\n");
  return(TRUE);
  }

Protecting Your Programs

Storing your programs in the CGI-BIN subdirectory provides some level of protection because typical users do not have access to it. In most cases, this is sufficient. However, there are some devious users out there who can figure out how to use your CGI program to "break into" your server.

You can guard against this by understanding the limitations of your code. For instance, a script that passes unchecked input to the operating system's command processor can be tricked into running commands: a user could send special characters in the URL-encoded data as input and gain access to your server. The easiest way to guard against this is by using compiled programs instead of scripts. You can use protection setups and ACL files to protect your programs as well. See Chapter 7. "Protecting Your Server".
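One way to limit what your code accepts is to check each value against the characters you expect before using it. IsSafeValue below is a hypothetical helper, a sketch rather than a complete defense. Listing the characters you allow is generally safer than trying to list every dangerous one.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if value is non-empty and contains only letters, digits,
   and characters from allowed; otherwise returns 0. */
int IsSafeValue(const char *value, const char *allowed)
{
    if (value == NULL || *value == '\0')
        return 0;                       /* treat empty input as unsafe */

    for (; *value != '\0'; value++) {
        unsigned char c = (unsigned char) *value;

        if (!isalnum(c) && strchr(allowed, c) == NULL)
            return 0;
    }
    return 1;
}
```

For example, a program that uses a value to build a file name might call IsSafeValue(value, "-_.") and return an error page if the check fails. This check alone still accepts "..", so a file-name check would also need to reject that.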


Environment Variables

Before writing your CGI program, you need to understand the format in which the server will pass the data. The server receives the URL-encoded information and, depending on the type of request, passes the information to the CGI program using environment variables, command line arguments, or standard input.
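The choice between environment variables and standard input can be sketched as follows. GetRawInput is a hypothetical helper. It takes the values of REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH as parameters (normally the results of getenv) so the logic can be exercised without a server, and it handles only GET and POST.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies the raw URL-encoded data into buf (buflen bytes).
   Returns the number of bytes stored, or -1 on error. */
long GetRawInput(const char *method, const char *query, const char *length,
                 FILE *in, char *buf, size_t buflen)
{
    size_t n;

    if (method == NULL || buflen == 0)
        return -1;

    if (strcmp(method, "GET") == 0) {
        /* GET: the data is the part of the URL after the ? */
        if (query == NULL)
            return -1;
        n = strlen(query);
        if (n >= buflen)
            return -1;
        memcpy(buf, query, n + 1);
        return (long) n;
    }

    if (strcmp(method, "POST") == 0) {
        /* POST: read exactly CONTENT_LENGTH bytes from standard input,
           since the server need not send an end-of-file marker */
        long len;

        if (length == NULL || in == NULL)
            return -1;
        len = atol(length);
        if (len < 0 || (size_t) len >= buflen)
            return -1;
        n = fread(buf, 1, (size_t) len, in);
        buf[n] = '\0';
        return (long) n;
    }

    return -1;   /* other methods are not handled in this sketch */
}
```

A program would call it as GetRawInput(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), getenv("CONTENT_LENGTH"), stdin, buf, sizeof buf).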

For all requests, regardless of type, certain information is passed using the following environment variables:

AUTH_TYPE
If the server supports client authentication and the script is protected, this environment variable contains the method used to authenticate the client. For example:
Basic

CONTENT_LENGTH
When information is sent with the method of POST, this variable contains the number of characters of data. Servers typically do not send an end-of-file flag when they forward the information using stdin. If needed, you can use the CONTENT_LENGTH value to determine the end of the input string. For example:
7034

CONTENT_TYPE
When information is sent with the method of POST, this variable contains the type of data included. You can create your own content type in the server configuration file and map it to a viewer. For example:
application/x-www-form-urlencoded

GATEWAY_INTERFACE
Contains the version of CGI that the server is using. For example:
CGI/1.1

HTTP_USER_AGENT
Contains the name of your browser. For example:
IBM WebExplorer dll /v1.03

HTTP_ACCEPT
Contains the list of MIME types the browser accepts. For example:
text/html

PATH_INFO
Contains the additional path information as sent by the Web browser. For example:
/ballyhoo

PATH_TRANSLATED
Contains the decoded or translated version of the path information contained in PATH_INFO. For example:
d:/wwwhome/ballyhoo

QUERY_STRING
When information is sent using a method of GET, this variable contains the information in a query that follows the ?. This information must be decoded by the CGI program. For example:
NAME=Eugene+T%2E+Fox&ADDR=etfox%40ibm.net&INTEREST=xyz

REFERER_URL
The last URL location of the browser. For example:
http://www.acme.com/homepage

REMOTE_ADDR
Contains the IP address of the Web browser, if available. For example:
9.23.06.8

REMOTE_HOST
Contains the host name of the Web browser, if available. For example:
raleigh.ibm.com

REMOTE_IDENT
Contains the user ID of the remote user. For example:
Jillx

REMOTE_USER
If the server supports client authentication and the script is protected, this environment variable contains the username passed for authentication. For example:
Jillx

REQUEST_METHOD
Contains the method (as specified with the METHOD attribute in an HTML form) used to send the request. For example:
GET or POST

SERVER_NAME
Contains the host name or IP address of the server. For example:
www.ibm.com

SERVER_PORT
Contains the port number to which the client request was sent. For example:
80

SERVER_PROTOCOL
Contains the name and version of the protocol used to make the request. For example:
HTTP/1.0

SERVER_SOFTWARE
Contains the name and version of the server. For example:
Internet Connection Server/1.0

Requests from Standard Search (ISINDEX) Documents

ISINDEX is an HTML tag that identifies the document as a standard search document and causes the browser to automatically generate an entry field. When information is sent from an ISINDEX document, the server takes the appended data (the information following the ?), breaks it at the pluses (+), and sends the data to the CGI program as command line arguments (argv). For example:

<ISINDEX>
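A program behind an ISINDEX document therefore finds the search words in argv. The sketch below, using the hypothetical name JoinSearchWords, rebuilds them into one space-separated string that can be passed to a search routine.

```c
#include <stdio.h>
#include <string.h>

/* Joins argv[1] .. argv[argc-1] into one space-separated string in out.
   Returns the number of words, or -1 if out (outlen bytes) is too small. */
int JoinSearchWords(int argc, char *argv[], char *out, size_t outlen)
{
    int i;
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (i = 1; i < argc; i++) {
        size_t len = strlen(argv[i]);
        size_t sep = (i > 1) ? 1 : 0;   /* space before all but the first */

        if (used + sep + len >= outlen)
            return -1;
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, argv[i], len + 1);   /* copies the NUL too */
        used += len;
    }
    return argc > 1 ? argc - 1 : 0;
}
```

The program's main function would call JoinSearchWords(argc, argv, words, sizeof words) and then search for the resulting phrase.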

Note: It is possible to write CGI scripts that display all environment variables. At times these variables may include sensitive data such as user IDs and passwords for various products. So you must be careful about displaying environment variables in your CGI scripts and you must be careful about who has access to them.

